13 research outputs found

    Tight Hardness Results for Maximum Weight Rectangles

    Given n weighted points (positive or negative) in d dimensions, what is the axis-aligned box which maximizes the total weight of the points it contains? The best known algorithm for this problem is based on a reduction to a related problem, the Weighted Depth problem [T. M. Chan, FOCS'13], and runs in time O(n^d). It was conjectured [Barbay et al., CCCG'13] that this runtime is tight up to subpolynomial factors. We answer this conjecture affirmatively by providing a matching conditional lower bound. We also provide conditional lower bounds for the special case when points are arranged in a grid (a well-studied problem known as the Maximum Subarray problem), as well as for other related problems. All our lower bounds are based on assumptions that the best known algorithms for the All-Pairs Shortest Paths problem (APSP) and for the Max-Weight k-Clique problem in edge-weighted graphs are essentially optimal.
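
    To make the grid special case concrete, the sketch below shows the classical Kadane-based algorithm for the Maximum Subarray problem: it finds the maximum-weight axis-aligned submatrix of an n x n grid in O(n^3) time, which is essentially the best known and which the conditional lower bounds described above suggest is close to optimal. The function and variable names are illustrative, not taken from the paper.

    def max_weight_submatrix(grid):
        """Return the maximum total weight over all contiguous submatrices of grid."""
        n_rows, n_cols = len(grid), len(grid[0])
        best = float("-inf")
        for top in range(n_rows):
            col_sums = [0] * n_cols              # column sums of rows top..bottom
            for bottom in range(top, n_rows):
                for c in range(n_cols):
                    col_sums[c] += grid[bottom][c]
                # 1-D Kadane pass over the collapsed column sums
                running, best_here = 0, float("-inf")
                for value in col_sums:
                    running = max(value, running + value)
                    best_here = max(best_here, running)
                best = max(best, best_here)
        return best

    # Example: the best submatrix of this 3 x 3 grid is the single cell of weight 4.
    print(max_weight_submatrix([[1, -2, 3], [-1, 4, -5], [2, -1, 2]]))  # 4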

    Testing properties of Ising models

    Thesis: S.M., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2017. This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections. Cataloged from student-submitted PDF version of thesis. Includes bibliographical references (pages 99-102).
    Given samples from an unknown multivariate distribution p, is it possible to distinguish whether p is the product of its marginals versus p being ε-far from every product distribution? Similarly, is it possible to distinguish whether p equals a given distribution q versus p and q being ε-far from each other? These problems of testing independence and goodness-of-fit have received enormous attention in statistics, information theory, and theoretical computer science, with sample-optimal algorithms known in several interesting regimes of parameters [14, 15, 17, 18, 20]. Unfortunately, it has also been understood that these problems become intractable in large dimensions, necessitating exponential sample complexity. Motivated by the exponential lower bounds for general distributions as well as the ubiquity of Markov Random Fields (MRFs) in the modeling of high-dimensional distributions, we study distribution testing on structured multivariate distributions, and in particular the prototypical example of MRFs: the Ising Model. We demonstrate that, in this structured setting, we can avoid the curse of dimensionality, obtaining sample- and time-efficient testers for independence and goodness-of-fit with a sample complexity of poly(n)/ε² on n-node Ising models. Along the way, we develop new tools for establishing concentration of functions of the Ising model, using the exchangeable pairs framework developed by Chatterjee [27] and improving upon this framework. In particular, we prove tighter concentration results for multilinear functions of the Ising model in the high-temperature regime. We also prove a lower bound of n/ε on the sample complexity required for testing uniformity and independence of n-node Ising models.
    by Sai Nishanth Dikkala. S.M.
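
    For intuition, here is a minimal, hedged sketch of the kind of pairwise-correlation statistic on which such independence testers can be built: it aggregates empirical covariances of the spins across samples and compares the aggregate to a threshold. This is an illustration, not the thesis's exact tester; the statistic, threshold, and names below are placeholders, and the thesis's actual guarantees rest on the concentration bounds it proves in the high-temperature regime.

    import numpy as np

    def independence_statistic(samples):
        """samples: m x n array of spins in {-1, +1}, one Ising sample per row."""
        means = samples.mean(axis=0)                       # empirical E[X_u]
        second = samples.T @ samples / samples.shape[0]    # empirical E[X_u X_v]
        cov = second - np.outer(means, means)              # empirical Cov(X_u, X_v)
        np.fill_diagonal(cov, 0.0)
        return float(np.sum(cov ** 2))                     # aggregate pairwise dependence

    def test_independence(samples, threshold):
        """Declare 'product of marginals' iff the aggregate statistic is small."""
        return independence_statistic(samples) < threshold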

    Statistical inference from dependent data: networks and Markov chains

    Thesis: Ph.D., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, May 2020. Cataloged from the official PDF of thesis. Includes bibliographical references (pages 259-270).
    In recent decades, the study of high-dimensional probability has taken center stage within many research communities, including Computer Science, Statistics, and Machine Learning. Very often, due to the process according to which data is collected, the samples in a dataset have implicit correlations amongst them. Such correlations are commonly ignored as a first approximation when trying to analyze statistical and computational aspects of an inference task. In this thesis, we explore how to model such dependencies between samples using structured high-dimensional distributions which result from imposing a Markovian property on the joint distribution of the data, namely Markov Random Fields (MRFs) and Markov chains. On MRFs, we explore a quantification for the amount of dependence and we strengthen previously known measure concentration results under a certain weak dependence condition on an MRF called the high-temperature regime. We then apply our novel measure concentration bounds to improve the accuracy of samples computed according to a certain Markov Chain Monte Carlo procedure. We then show how to extend some classical results from statistical learning theory on PAC-learnability and uniform convergence to training data which is dependent under the high-temperature condition. Then, we explore the task of regression on data which is dependent according to an MRF under a stronger amount of dependence than is allowed by the high-temperature condition. We then shift our focus to Markov chains, where we explore the question of testing whether an observed trajectory corresponds to a given chain P or not. We discuss what a reasonable formulation of this problem is and provide a tester which works without observing a trajectory whose length involves multiplicative factors of the mixing or covering time of the chain P. We finally conclude with some broad directions for further research on statistical inference under data dependence.
    by Sai Nishanth Dikkala. Ph.D., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science.
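
    As a concrete reference point for the Markov Chain Monte Carlo procedure mentioned above, here is a minimal sketch of single-site Glauber dynamics for an Ising-type MRF with density proportional to exp(sum over pairs of J[u,v]*x_u*x_v plus sum over nodes of h[u]*x_u), where x is in {-1, +1}^n. This is a standard illustrative sampler, not the thesis's specific procedure; J, h, the step count, and the function names are assumptions made for the example.

    import numpy as np

    def glauber_step(x, J, h, rng):
        """Resample one uniformly chosen spin from its conditional distribution."""
        i = rng.integers(len(x))
        field = h[i] + J[i] @ x - J[i, i] * x[i]       # local field at node i
        p_plus = 1.0 / (1.0 + np.exp(-2.0 * field))    # P(x_i = +1 | rest)
        x[i] = 1 if rng.random() < p_plus else -1
        return x

    def sample_ising(J, h, n_steps=10_000, seed=0):
        """Run Glauber dynamics for n_steps and return the final configuration."""
        rng = np.random.default_rng(seed)
        x = rng.choice(np.array([-1, 1]), size=len(h))
        for _ in range(n_steps):
            x = glauber_step(x, J, h, rng)
        return x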